library(tidyverse)
library(ISLR)
library(MASS)
Question 9
Here, I’ll be using the Auto dataset.
data("Auto")
attach(Auto)
fit = lm(mpg ~ horsepower)
summary(fit)
##
## Call:
## lm(formula = mpg ~ horsepower)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
Is there a relationship between the predictor and the response?
The results show a negative relationship between mpg (miles per gallon) and horsepower: the coefficient is highly significant (p < 2e-16). For every one-unit increase in horsepower, mpg decreases by about 0.158 on average.
How strong is the relationship between the predictor and the response? With an R-squared of about 0.61, horsepower explains just over 60% of the variance in mpg, so the relationship is reasonably strong.
What is the predicted mpg associated with a horsepower of 98? What are the associated 95 % confidence and prediction intervals?
predict(fit, data.frame(horsepower = 98), interval = "confidence")
## fit lwr upr
## 1 24.46708 23.97308 24.96108
predict(fit, data.frame(horsepower = 98), interval = "prediction")
## fit lwr upr
## 1 24.46708 14.8094 34.12476
par(mfrow = c(1,1))
plot(horsepower, mpg)
abline(fit)
par(mfrow = c(2,2))
plot(fit)
The plot of residuals versus fitted values indicates non-linearity in the data. The plot of standardized residuals versus leverage points to a few outliers (standardized residuals above 2 or below -2) and a few high-leverage points.
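The visual impression from the diagnostic plots can be cross-checked numerically. A minimal sketch, assuming the ISLR package is installed and using the conventional rule-of-thumb cutoffs (|studentized residual| > 2 for outliers, leverage > 2(p)/n for high-leverage points):

```r
library(ISLR)

fit <- lm(mpg ~ horsepower, data = Auto)

# Studentized residuals and leverage (hat) values for each observation
rs  <- rstudent(fit)
lev <- hatvalues(fit)

# Conventional rule-of-thumb cutoffs
n <- nrow(Auto)
p <- length(coef(fit))   # intercept + slope = 2
sum(abs(rs) > 2)         # number of candidate outliers
sum(lev > 2 * p / n)     # number of high-leverage points
```

The counts flagged here correspond to the points that stand out in the residuals-vs-leverage panel.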
pairs(Auto)
Let’s look at the correlation matrix of the quantitative variables (excluding name).
cor(Auto[1:8])
## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000
fit = lm(mpg ~ . -name, data = Auto)
summary(fit)
##
## Call:
## lm(formula = mpg ~ . - name, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16
Is there a relationship between the predictors and the response? There appears to be a relationship between the predictors and the response: the F-statistic is 252.4 with a p-value well below 0.05, so at least one predictor has a non-zero coefficient.
Which predictors appear to have a statistically significant relationship to the response? Displacement, weight, year and origin have a statistically significant relationship to the response.
What does the coefficient for the year variable suggest?
The coefficient for the year variable (about 0.75) suggests that, holding the other predictors fixed, cars have become more fuel efficient over time: mpg increases by about 0.75 per model year.
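A back-of-the-envelope reading of that coefficient (the estimate is copied from the summary above):

```r
# Expected mpg gain over ten model years, other predictors held fixed
# (0.750773 is the year coefficient from the summary above)
gain <- 0.750773 * 10
gain   # about 7.5 mpg per decade
```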
par(mfrow = c(2,2))
plot(fit)
a = Auto[1:8]
fit_interaction = lm(mpg ~ .*. , data = a)
summary(fit_interaction)
##
## Call:
## lm(formula = mpg ~ . * ., data = a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6303 -1.4481 0.0596 1.2739 11.1386
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.548e+01 5.314e+01 0.668 0.50475
## cylinders 6.989e+00 8.248e+00 0.847 0.39738
## displacement -4.785e-01 1.894e-01 -2.527 0.01192 *
## horsepower 5.034e-01 3.470e-01 1.451 0.14769
## weight 4.133e-03 1.759e-02 0.235 0.81442
## acceleration -5.859e+00 2.174e+00 -2.696 0.00735 **
## year 6.974e-01 6.097e-01 1.144 0.25340
## origin -2.090e+01 7.097e+00 -2.944 0.00345 **
## cylinders:displacement -3.383e-03 6.455e-03 -0.524 0.60051
## cylinders:horsepower 1.161e-02 2.420e-02 0.480 0.63157
## cylinders:weight 3.575e-04 8.955e-04 0.399 0.69000
## cylinders:acceleration 2.779e-01 1.664e-01 1.670 0.09584 .
## cylinders:year -1.741e-01 9.714e-02 -1.793 0.07389 .
## cylinders:origin 4.022e-01 4.926e-01 0.816 0.41482
## displacement:horsepower -8.491e-05 2.885e-04 -0.294 0.76867
## displacement:weight 2.472e-05 1.470e-05 1.682 0.09342 .
## displacement:acceleration -3.479e-03 3.342e-03 -1.041 0.29853
## displacement:year 5.934e-03 2.391e-03 2.482 0.01352 *
## displacement:origin 2.398e-02 1.947e-02 1.232 0.21875
## horsepower:weight -1.968e-05 2.924e-05 -0.673 0.50124
## horsepower:acceleration -7.213e-03 3.719e-03 -1.939 0.05325 .
## horsepower:year -5.838e-03 3.938e-03 -1.482 0.13916
## horsepower:origin 2.233e-03 2.930e-02 0.076 0.93931
## weight:acceleration 2.346e-04 2.289e-04 1.025 0.30596
## weight:year -2.245e-04 2.127e-04 -1.056 0.29182
## weight:origin -5.789e-04 1.591e-03 -0.364 0.71623
## acceleration:year 5.562e-02 2.558e-02 2.174 0.03033 *
## acceleration:origin 4.583e-01 1.567e-01 2.926 0.00365 **
## year:origin 1.393e-01 7.399e-02 1.882 0.06062 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.695 on 363 degrees of freedom
## Multiple R-squared: 0.8893, Adjusted R-squared: 0.8808
## F-statistic: 104.2 on 28 and 363 DF, p-value: < 2.2e-16
Question 10
data("Carseats")
Fit a multiple regression model to predict Sales using Price, Urban, and US.
attach(Carseats)
fit = lm(Sales ~ Price + Urban + US)
summary(fit)
##
## Call:
## lm(formula = Sales ~ Price + Urban + US)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9206 -1.6220 -0.0564 1.5786 7.0581
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.043469 0.651012 20.036 < 2e-16 ***
## Price -0.054459 0.005242 -10.389 < 2e-16 ***
## UrbanYes -0.021916 0.271650 -0.081 0.936
## USYes 1.200573 0.259042 4.635 4.86e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2335
## F-statistic: 41.52 on 3 and 396 DF, p-value: < 2.2e-16
Note that Sales is recorded in thousands of units. For every $1 increase in the price of a car seat, sales decrease by about 54 units on average, adjusting for Urban and US. Sales in urban areas are about 22 units lower on average, adjusting for Price and US, though this effect is not statistically significant. Sales at stores in the US are about 1,201 units higher on average, adjusting for Price and Urban.
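Written out, the fitted equation can be used for a by-hand prediction. A sketch for a hypothetical store (non-urban, located in the US, Price = 120; the coefficients are copied from the summary above):

```r
# Coefficients from the fitted model above
b0      <- 13.043469
b_price <- -0.054459
b_urban <- -0.021916
b_us    <-  1.200573

# Hypothetical store: Urban = No (0), US = Yes (1), Price = 120
sales_hat <- b0 + b_price * 120 + b_urban * 0 + b_us * 1
sales_hat   # predicted Sales, in thousands of units
```

This reproduces what `predict()` would return for the same inputs, up to rounding of the coefficients.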
Fitting a model with only the significant variables.
fit1 = lm(Sales ~ Price + US)
summary(fit1)
##
## Call:
## lm(formula = Sales ~ Price + US)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9269 -1.6286 -0.0574 1.5766 7.0515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.03079 0.63098 20.652 < 2e-16 ***
## Price -0.05448 0.00523 -10.416 < 2e-16 ***
## USYes 1.19964 0.25846 4.641 4.71e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared: 0.2393, Adjusted R-squared: 0.2354
## F-statistic: 62.43 on 2 and 397 DF, p-value: < 2.2e-16
The multiple R-squared is essentially unchanged, but the adjusted R-squared for the smaller model is slightly better, so dropping Urban costs nothing.
Confidence intervals
confint(fit)
## 2.5 % 97.5 %
## (Intercept) 11.76359670 14.32334118
## Price -0.06476419 -0.04415351
## UrbanYes -0.55597316 0.51214085
## USYes 0.69130419 1.70984121
Is there evidence of outliers or high-leverage observations?
plot(fit)
Question 11 In this problem we will investigate the t-statistic for the null hypothesis H0 : β = 0 in simple linear regression without an intercept.
set.seed(1)
x = rnorm(100)
y = 2 * x + rnorm(100)
Regression of y on x without the intercept
fit = lm(y ~ x + 0)
summary(fit)
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9154 -0.6472 -0.1771 0.5056 2.3109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 1.9939 0.1065 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9586 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
beta coefficient = 1.9939, standard error = 0.1065, and t-value = 18.73.
Regression of x on y without the intercept
fit1 = lm(x ~ y + 0)
summary(fit1)
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.8699 -0.2368 0.1030 0.2858 0.8938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 0.39111 0.02089 18.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4246 on 99 degrees of freedom
## Multiple R-squared: 0.7798, Adjusted R-squared: 0.7776
## F-statistic: 350.7 on 1 and 99 DF, p-value: < 2.2e-16
beta = 0.39111, standard error = 0.02089, and t-value = 18.73.
We obtain the same value for the t-statistic and consequently the same p-value. Both results in (a) and (b) describe the same underlying line: y = 2x + ε can equivalently be written x = 0.5(y − ε).
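The equality is not a coincidence: for regression through the origin, the t-statistic has a closed form (the one quoted in the ISLR exercise) that is symmetric in x and y. A quick check against the fits above:

```r
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
n <- length(x)

# Closed-form t-statistic for the no-intercept slope; swapping x and y
# leaves every term unchanged
t_closed <- sqrt(n - 1) * sum(x * y) /
  sqrt(sum(x^2) * sum(y^2) - sum(x * y)^2)
t_closed   # matches the 18.73 reported by both no-intercept fits
```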
Regression with intercept
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.8768 -0.6138 -0.1395 0.5394 2.3462
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03769 0.09699 -0.389 0.698
## x 1.99894 0.10773 18.556 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9628 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
summary(lm(x ~ y))
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.90848 -0.28101 0.06274 0.24570 0.85736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.03880 0.04266 0.91 0.365
## y 0.38942 0.02099 18.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4249 on 98 degrees of freedom
## Multiple R-squared: 0.7784, Adjusted R-squared: 0.7762
## F-statistic: 344.3 on 1 and 98 DF, p-value: < 2.2e-16
In this case too, the t-statistics for the slope are equal (18.56 in both regressions).
Question 12 Under what circumstance is the coefficient estimate for the regression of X onto Y the same as the coefficient estimate for the regression of Y onto X?
When the sum of the squares of the observed y-values equals the sum of the squares of the observed x-values.
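A sketch of why: without an intercept, the slope of y on x is Σxᵢyᵢ/Σxᵢ² and the slope of x on y is Σxᵢyᵢ/Σyᵢ², so the two agree exactly when Σxᵢ² = Σyᵢ². The seed and data below are illustrative, not part of the exercise:

```r
set.seed(2)   # illustrative seed, not from the exercise
x <- rnorm(50)
y <- 3 * x + rnorm(50)

b_yx <- sum(x * y) / sum(x^2)   # slope of y on x (no intercept)
b_xy <- sum(x * y) / sum(y^2)   # slope of x on y (no intercept)

# Both formulas match what lm() computes
c(b_yx, unname(coef(lm(y ~ x + 0))),
  b_xy, unname(coef(lm(x ~ y + 0))))
```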
Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is different from the coefficient estimate for the regression of Y onto X
set.seed(1)
x = rnorm(100)
y = 2*x
summary(lm(y ~ x + 0))
## Warning in summary.lm(lm(y ~ x + 0)): essentially perfect fit: summary may
## be unreliable
##
## Call:
## lm(formula = y ~ x + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.776e-16 -3.378e-17 2.680e-18 6.113e-17 5.105e-16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x 2.000e+00 1.296e-17 1.543e+17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.167e-16 on 99 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.382e+34 on 1 and 99 DF, p-value: < 2.2e-16
summary(lm(x ~ y + 0))
## Warning in summary.lm(lm(x ~ y + 0)): essentially perfect fit: summary may
## be unreliable
##
## Call:
## lm(formula = x ~ y + 0)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.888e-16 -1.689e-17 1.339e-18 3.057e-17 2.552e-16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## y 5.00e-01 3.24e-18 1.543e+17 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.833e-17 on 99 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 2.382e+34 on 1 and 99 DF, p-value: < 2.2e-16
Generate an example in R with n = 100 observations in which the coefficient estimate for the regression of X onto Y is the same as the coefficient estimate for the regression of Y onto X.
set.seed(1)
x = rnorm(100)
y = sample(x, 100)
g = data.frame(x = x, y = y)
ggplot(g, aes(x = x, y = y)) + geom_point() + geom_line()
sum(x^2)
## [1] 81.05509
sum(y^2)
## [1] 81.05509
summary(lm(y ~ x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.32827 -0.60584 0.00216 0.58434 2.29058
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.108130 0.090942 1.189 0.237
## x 0.006955 0.101013 0.069 0.945
##
## Residual standard error: 0.9027 on 98 degrees of freedom
## Multiple R-squared: 4.837e-05, Adjusted R-squared: -0.01016
## F-statistic: 0.00474 on 1 and 98 DF, p-value: 0.9452
summary(lm(x ~ y))
##
## Call:
## lm(formula = x ~ y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.33102 -0.60922 0.00922 0.57929 2.29163
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.108130 0.090942 1.189 0.237
## y 0.006955 0.101013 0.069 0.945
##
## Residual standard error: 0.9027 on 98 degrees of freedom
## Multiple R-squared: 4.837e-05, Adjusted R-squared: -0.01016
## F-statistic: 0.00474 on 1 and 98 DF, p-value: 0.9452
Question 13
x = rnorm(100)
eps = rnorm(100, 0, sqrt(0.25))
y = -1 + 0.5 * x + eps
y has length 100; β0 is -1 and β1 is 0.5.
par(mfrow = c(1,1))
plot(x, y)
fit = lm(y ~ x)
summary(fit)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2425 -0.2512 0.0136 0.3502 1.2736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.94976 0.04901 -19.38 <2e-16 ***
## x 0.49066 0.04695 10.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4899 on 98 degrees of freedom
## Multiple R-squared: 0.527, Adjusted R-squared: 0.5222
## F-statistic: 109.2 on 1 and 98 DF, p-value: < 2.2e-16
β̂0 = -0.95 and β̂1 = 0.49, both close to the true coefficients used to construct the data. The model has a large F-statistic with a near-zero p-value, so the null hypothesis can be rejected.
Display the least squares line on the scatterplot obtained in (d). Draw the population regression line on the plot, in a different color. Use the legend() command to create an appropriate legend.
plot(x, y)
abline(fit, lwd = 3, col = 2)
abline(-1, 0.5, lwd = 3, col = 3)
legend("topleft", legend = c("model fit", "pop. regression"), col = 2:3, lwd = 3)
Now fit a polynomial regression model that predicts y using x and x2. Is there evidence that the quadratic term improves the model fit?
fit1 = lm(y ~ x + I(x^2))
summary(fit1)
##
## Call:
## lm(formula = y ~ x + I(x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.2428 -0.2535 0.0137 0.3485 1.2712
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.947344 0.060802 -15.581 <2e-16 ***
## x 0.490788 0.047230 10.392 <2e-16 ***
## I(x^2) -0.002217 0.032762 -0.068 0.946
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4925 on 97 degrees of freedom
## Multiple R-squared: 0.5271, Adjusted R-squared: 0.5173
## F-statistic: 54.05 on 2 and 97 DF, p-value: < 2.2e-16
There is no evidence that the quadratic term improves the fit: R² is essentially unchanged (0.527 vs. 0.5271), the RSE and adjusted R² are slightly worse, and the p-value for the x² coefficient (0.946) indicates no relationship between y and x².
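An equivalent way to test the quadratic term is a partial F-test with anova(); for a single added term the F-statistic is just the square of its t-statistic. A self-contained sketch (the seed here is illustrative, since the original chunk did not set one):

```r
set.seed(42)   # illustrative seed; the chunk above did not set one
x <- rnorm(100)
y <- -1 + 0.5 * x + rnorm(100, 0, 0.5)

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))

# Partial F-test: does adding x^2 reduce the RSS significantly?
anova(fit_lin, fit_quad)
```

The Pr(>F) column here agrees with the p-value for I(x^2) in summary(fit_quad).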
Repeat after modifying the data generation process in such a way that there is less noise in the data. The model (3.39) should remain the same. You can do this by decreasing the variance of the normal distribution used to generate the error term ε.
x1 = rnorm(100)
eps1 = rnorm(100, 0, 0.1)
y1 = -1 + 0.5 * x1 + eps1
par(mfrow = c(1,1))
plot(x1, y1)
fit1 = lm(y1 ~ x1)
summary(fit1)
##
## Call:
## lm(formula = y1 ~ x1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.25031 -0.07919 0.00240 0.05670 0.38216
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.00058 0.01101 -90.92 <2e-16 ***
## x1 0.50505 0.01052 47.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1098 on 98 degrees of freedom
## Multiple R-squared: 0.9592, Adjusted R-squared: 0.9588
## F-statistic: 2303 on 1 and 98 DF, p-value: < 2.2e-16
Repeat after modifying the data generation process in such a way that there is more noise in the data. The model (3.39) should remain the same. You can do this by increasing the variance of the normal distribution used to generate the error term ε
x2 = rnorm(100)
eps2 = rnorm(100, 0, 1)
y2 = -1 + 0.5 * x2 + eps2
par(mfrow = c(1,1))
plot(x2, y2)
fit2 = lm(y2 ~ x2)
summary(fit2)
##
## Call:
## lm(formula = y2 ~ x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.68380 -0.85153 -0.09211 0.91308 2.17018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.0879 0.1107 -9.823 2.93e-16 ***
## x2 0.7261 0.1063 6.829 7.23e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.103 on 98 degrees of freedom
## Multiple R-squared: 0.3224, Adjusted R-squared: 0.3155
## F-statistic: 46.63 on 1 and 98 DF, p-value: 7.23e-10
What are the confidence intervals for β0 and β1 based on the original data set, the noisier data set, and the less noisy data set?
confint(fit)
## 2.5 % 97.5 %
## (Intercept) -1.0470067 -0.8525044
## x 0.3974859 0.5838434
confint(fit1)
## 2.5 % 97.5 %
## (Intercept) -1.0224169 -0.9787372
## x1 0.4841616 0.5259337
confint(fit2)
## 2.5 % 97.5 %
## (Intercept) -1.3076392 -0.8680943
## x2 0.5150973 0.9371055
All slope intervals are centered near the true value of 0.5 (and the intercept intervals near -1). The interval from the low-noise fit is the narrowest and the interval from the high-noise fit is the widest, reflecting the error variance in each data set.
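Because the three chunks above each drew fresh data, the pattern is easier to see in a controlled sketch where only the error standard deviation changes (the function name and seed here are my own):

```r
# Width of the 95% CI for the slope as a function of the noise level.
# Re-seeding inside the function makes the x's and standardized errors
# identical across calls, so the widths scale exactly with the error sd.
ci_width <- function(sd_eps) {
  set.seed(1)
  x <- rnorm(100)
  y <- -1 + 0.5 * x + rnorm(100, 0, sd_eps)
  unname(diff(confint(lm(y ~ x))["x", ]))
}

w <- sapply(c(0.1, 0.5, 1), ci_width)
w   # widths grow with the noise
```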
Question 14 - Problem of collinearity
set.seed(1)
x1 = runif(100)
x2 = 0.5 * x1 + rnorm(100)/10
y = 2 + 2 * x1 + 0.3 * x2 + rnorm(100)
The last line creates a linear model in which y is a function of x1 and x2: y = 2 + 2·x1 + 0.3·x2 + ε, so β0 = 2, β1 = 2 and β2 = 0.3.
What is the correlation between x1 and x2? Create a scatterplot displaying the relationship between the variables.
cor(x1, x2)
## [1] 0.8351212
The correlation is about 0.835, so x1 and x2 are highly correlated.
plot(x1, x2)
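The severity of the collinearity can be quantified with a variance inflation factor. A by-hand sketch (no extra packages needed; with a single other predictor, the VIF is 1/(1 − r²), where r is the correlation above):

```r
set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10

# VIF of x2 in a model that also contains x1: regress x2 on x1 and
# invert 1 - R^2
r2  <- summary(lm(x2 ~ x1))$r.squared
vif <- 1 / (1 - r2)
vif   # roughly 3.3 here: noticeable, though below the common cutoff of 5
```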
Using this data, fit a least squares regression to predict y using x1 and x2
fit = lm(y ~ x1 + x2)
summary(fit)
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8311 -0.7273 -0.0537 0.6338 2.3359
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
## x1 1.4396 0.7212 1.996 0.0487 *
## x2 1.0097 1.1337 0.891 0.3754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
## F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
The coefficient estimates β̂0, β̂1 and β̂2 are 2.1305, 1.4396 and 1.0097 respectively. Only β̂0 is close to its true value (β0 = 2). Since the p-value for x1 is below 0.05 we may reject H0 for β1; however, we may not reject H0 for β2, as its p-value (0.375) is above 0.05.